- Path: news-server.ncren.net!concert!ais!bruce
- From: bruce@ais.com
- Newsgroups: comp.dcom.modems
- Subject: Datacomm compression vs V.42bis compression (was Re: How come Z-Modem)
- Message-ID: <1996Mar26.100502.8753@ais>
- Date: 26 Mar 96 10:05:02 EST
- References: <4inro0$9mj@alcor.usc.edu> <4j7mp0$l33@sam.inforamp.net>
- Organization: Applied Information Systems, Chapel Hill, NC
-
- In article <4j7mp0$l33@sam.inforamp.net>, crs0794@inforamp.net (Geoffrey Welsh) writes:
- >
- > Oh, and yes... _full_ ZMODEM implementations include data compression, but
- > with data compression built in to any decent modem these days, that's not
- > very important.
-
- On the subject of data compression inside a file transfer protocol,
- I have a question, and I wonder whether anyone has ever run the
- tests needed to come up with a definitive answer.
-
- Given a pair of modems that use a decent-sized V.42bis dictionary,
- using the common file transfer / data communications protocols (ZMODEM,
- Kermit, SLIP, PPP), can you achieve faster overall throughput by turning
- on compression in the protocol, or by turning it off and allowing the
- modem to do the compression by itself? Or does it matter much? Assume
- that DTE speed is much greater than connect speed so that you are not DTE
- speed limited. For the protocols that allow bidirectional traffic (SLIP
- and PPP), you should probably also limit the traffic to primarily one-way,
- with the return traffic being the minimum necessary to maintain the
- traffic flow. Also, the size of the V.42bis dictionary needed to be
- "decent" is likely to be another parameter in the equation, as is the
- type of data being sent by the protocol (some data being more compressible
- than others).
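-
- To make the comparison concrete at all, you need some way of timing the
- link itself.  Here's a rough sketch (untested; the name "pump" and the
- device name are just examples, and it ignores protocol framing and ACK
- traffic entirely) of the kind of harness I have in mind: push a file out
- an already-configured serial port, once raw and once after
- pre-compressing it, and see whether the times differ.
-
-     #include <stdio.h>
-     #include <fcntl.h>
-     #include <unistd.h>
-     #include <termios.h>
-     #include <sys/time.h>
-     #include <sys/types.h>
-
-     /* Crude throughput check.  Assumes the port (e.g. /dev/cua0) has
-      * already been set up -- speed, raw mode, RTS/CTS flow control --
-      * by your comm program or stty, and that the modems have a V.42
-      * connection with V.42bis enabled.
-      */
-     int main(int argc, char **argv)
-     {
-         unsigned char buf[1024];
-         struct timeval t0, t1;
-         double secs;
-         long total = 0;
-         ssize_t n;
-         int port, file;
-
-         if (argc != 3) {
-             fprintf(stderr, "usage: pump port file\n");
-             return 1;
-         }
-         if ((port = open(argv[1], O_WRONLY)) < 0 ||
-             (file = open(argv[2], O_RDONLY)) < 0) {
-             perror("open");
-             return 1;
-         }
-
-         gettimeofday(&t0, NULL);
-         while ((n = read(file, buf, sizeof buf)) > 0) {
-             ssize_t off = 0;
-             while (off < n) {                 /* write() may be partial */
-                 ssize_t w = write(port, buf + off, n - off);
-                 if (w < 0) { perror("write"); return 1; }
-                 off += w;
-             }
-             total += n;
-         }
-         tcdrain(port);                        /* wait for the output to drain */
-         gettimeofday(&t1, NULL);
-
-         secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
-         printf("%ld bytes in %.2f s = %.0f cps\n", total, secs, total / secs);
-         close(file);
-         close(port);
-         return 0;
-     }
-
- Comparing the raw time against the pre-compressed time (plus whatever the
- pre-compression itself costs) is at least a first approximation to the
- protocol question, even though it leaves the protocol overhead out.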
-
- The point of the question is that, unlike compression algorithms such as
- the LZ-family algorithms used by GIF and ZIP, most file transfer and data comm
- protocols do compression by fairly simple "repeated byte elimination",
- that is, if you have a series of repeated bytes the algorithm will send
- these as something like:
-
- <REPEAT-INDICATOR> <REPEATED-BYTE> <REPEAT-COUNT>
-
- (typically 3 bytes), rather than as, for example, the single token that
- represents a string of bytes used by more sophisticated algorithms. Even
- if your file is fairly compressible using repeated byte elimination, you
- could find that the compression algorithm tends to defeat the dictionary
- used by LZ and V.42bis if there are a number of different combinations of
- <REPEATED-BYTE> and <REPEAT-COUNT> such that they tend to fill up the
- dictionary and crowd out potentially more useful strings. (Of course
- this is likely to require somewhat pathological data :->). Obviously
- you can also run into problems if the data contains a lot of occurrences
- of the byte value used as the <REPEAT-INDICATOR>, so that you need to insert
- some kind of escape to change the normal meaning of that value, but let's
- assume that you're not sending random or pre-compressed data -- i.e., that
- the data is text or executable or "normal" data files rather than ZIP or
- GIF files. In addition, the types of compression achieved by repeated-
- byte elimination are likely to be similar in their effect to what can be
- achieved by V.42bis, so that the benefit of using both is not obvious.
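-
- To be concrete, the sort of repeated-byte elimination I'm talking about
- looks something like the sketch below.  The indicator byte (0x90) and the
- minimum run length are arbitrary choices for the example, not any
- particular protocol's, and literal occurrences of the indicator are
- always sent in the three-byte form so the decoder can't misread them.
-
-     #include <stdio.h>
-
-     #define RPT    0x90    /* repeat-indicator byte (arbitrary) */
-     #define MINRUN 4       /* shorter runs are cheaper to send literally */
-
-     /* A run of N identical bytes (N >= MINRUN) becomes
-      * <RPT> <byte> <count>; everything else is passed through
-      * untouched, except that RPT itself is always escaped into the
-      * three-byte form.  Returns the number of bytes written to out.
-      */
-     static long rle_encode(const unsigned char *in, long len,
-                            unsigned char *out)
-     {
-         long i = 0, o = 0;
-
-         while (i < len) {
-             long run = 1;
-             while (i + run < len && in[i + run] == in[i] && run < 255)
-                 run++;
-
-             if (run >= MINRUN || in[i] == RPT) {
-                 out[o++] = RPT;                  /* 3 bytes replace the run */
-                 out[o++] = in[i];
-                 out[o++] = (unsigned char)run;
-             } else {
-                 long j;
-                 for (j = 0; j < run; j++)        /* short run: send as-is */
-                     out[o++] = in[i];
-             }
-             i += run;
-         }
-         return o;
-     }
-
-     int main(void)
-     {
-         unsigned char in[256], out[768];
-         long n, i;
-
-         for (i = 0; i < 200; i++) in[i] = 0;             /* a long run */
-         for (i = 0; i < 26; i++) in[200 + i] = 'a' + i;  /* some text  */
-         n = rle_encode(in, 226, out);
-         printf("226 bytes in, %ld bytes out\n", n);
-         return 0;
-     }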
-
- My _guess_ (and that's all that it is at this point) is that, assuming
- that you are not DTE-limited in some way, most of the time you'll find
- that it makes little or no detectable difference whether the DTE stream
- is compressed or not if the compression uses one of these simple schemes
- (formats such as ZIP _could_ achieve higher compression if they use a
- large enough dictionary). But has anyone actually run any tests on this?
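-
- One way to get a partial answer without tying up a phone line would be to
- count how many codewords a fixed-dictionary LZW-style coder emits for a
- file before and after repeated-byte elimination.  This is only a rough
- stand-in for V.42bis (which is LZW-based but has its own control codes,
- transparent/compressed mode switching, and dictionary-recovery rules),
- and the 2048-entry dictionary below is just an assumed size, but if the
- two codeword counts come out nearly the same, the protocol-level
- compression isn't buying you anything once the modem compresses.
-
-     #include <stdio.h>
-
-     #define DICT_MAX 2048   /* assumed size; V.42bis negotiates its own */
-
-     struct entry { int prefix; unsigned char ch; };
-     static struct entry dict[DICT_MAX];
-     static int dict_size;
-
-     static void dict_init(void)
-     {
-         int i;
-         for (i = 0; i < 256; i++) {     /* codes 0..255 = single bytes */
-             dict[i].prefix = -1;
-             dict[i].ch = (unsigned char)i;
-         }
-         dict_size = 256;
-     }
-
-     static int dict_find(int prefix, unsigned char ch)
-     {
-         int i;                          /* linear search: slow but simple */
-         for (i = 0; i < dict_size; i++)
-             if (dict[i].prefix == prefix && dict[i].ch == ch)
-                 return i;
-         return -1;
-     }
-
-     /* Count the codewords a greedy LZW-style coder would emit. */
-     static long lzw_codes(FILE *fp)
-     {
-         long codes = 0;
-         int c, w = -1;                  /* w == -1: no current match */
-
-         dict_init();
-         while ((c = getc(fp)) != EOF) {
-             int k = dict_find(w, (unsigned char)c);
-             if (k >= 0) {
-                 w = k;                  /* extend the current match */
-             } else {
-                 codes++;                /* emit the code for w */
-                 if (dict_size < DICT_MAX) {
-                     /* learn the new string; real V.42bis recovers old
-                      * entries when the dictionary fills, this sketch
-                      * just stops learning */
-                     dict[dict_size].prefix = w;
-                     dict[dict_size].ch = (unsigned char)c;
-                     dict_size++;
-                 }
-                 w = c;
-             }
-         }
-         if (w != -1)
-             codes++;                    /* flush the final match */
-         return codes;
-     }
-
-     int main(int argc, char **argv)
-     {
-         FILE *fp;
-
-         if (argc != 2 || (fp = fopen(argv[1], "rb")) == NULL) {
-             fprintf(stderr, "usage: lzwcount file\n");
-             return 1;
-         }
-         printf("%ld codewords\n", lzw_codes(fp));
-         fclose(fp);
-         return 0;
-     }
-
- Run it over a text file and over the same file after the repeated-byte
- elimination above; my bet is that the two counts (times the codeword
- width) end up close enough that the extra pass wasn't worth the CPU time.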
-
- Bruce C. Wright
-